-
Multi-Agent Reinforcement Learning can be used to learn solutions for a wide variety of tasks, but there are few safety guarantees about the policies that the agents learn. My research addresses the challenge of ensuring safety in communication-free multi-agent environments, using shielding as the primary tool. We introduce methods to completely prevent safety violations in domains for which a model is available, in both fully observable and partially observable environments. We present ongoing research on maximizing safety in environments for which no model is available, utilizing a centralized training, decentralized execution framework, and discuss future lines of research.
Free, publicly-accessible full text available June 5, 2026.
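As an illustration of the model-based shielding idea in this abstract, the sketch below shows a shield that uses a known transition model to veto unsafe joint actions before execution. The toy domain (two agents on a line, collision as the unsafe set) and the names `step` and `shield` are assumptions for illustration, not the paper's construction.

```python
# Toy model-based shield: two agents on a 1-D grid must never occupy
# the same cell. The shield simulates each proposed joint action with
# the known model and substitutes a safe alternative when needed.

UNSAFE = lambda s: s[0] == s[1]  # collision predicate on joint states

def step(state, actions):
    """Deterministic toy model: each agent moves by -1, 0, or +1."""
    return tuple(s + a for s, a in zip(state, actions))

def shield(state, proposed, candidates=(-1, 0, 1)):
    """Return the proposed joint action if the model says it is safe;
    otherwise substitute the first safe joint action found."""
    if not UNSAFE(step(state, proposed)):
        return proposed
    for a0 in candidates:
        for a1 in candidates:
            if not UNSAFE(step(state, (a0, a1))):
                return (a0, a1)
    raise RuntimeError("no safe joint action exists from this state")

state = (0, 2)
assert shield(state, (1, -1)) != (1, -1)  # (1, 1) would collide: overridden
```

Because the model and unsafe set are known in advance, this style of shield can prevent violations entirely, which is what distinguishes it from the model-free setting in the later abstracts.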
-
Shielding is an effective method for ensuring safety in multi-agent domains; however, its applicability has previously been limited to environments for which an approximate discrete model and safety specification are known in advance. We present a method for learning shields in cooperative fully-observable multi-agent environments where neither a model nor a safety specification is provided, using architectural constraints to realize several important properties of a shield. We show through a series of experiments that our learned shielding method is effective at significantly reducing safety violations, while largely maintaining the ability of an underlying reinforcement learning agent to optimize for reward.
Free, publicly-accessible full text available May 19, 2026.
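A minimal sketch of the model-free idea in this abstract: when no model or specification is given, a shield can be fit from experience, here as a tabular safety critic that estimates violation rates and intervenes only when the policy's action looks risky. The tabular form, the class name `LearnedShield`, and the threshold are illustrative assumptions, not the paper's architecture.

```python
# Learned shield without a model: estimate per-(state, action) violation
# rates from logged transitions, then veto actions above a risk threshold.
from collections import defaultdict

class LearnedShield:
    def __init__(self, threshold=0.5):
        self.counts = defaultdict(lambda: [0, 0])  # (violations, visits)
        self.threshold = threshold

    def update(self, state, action, violated):
        c = self.counts[(state, action)]
        c[0] += int(violated)
        c[1] += 1

    def risk(self, state, action):
        v, n = self.counts[(state, action)]
        return v / n if n else 0.0

    def filter(self, state, proposed, alternatives):
        # Minimal interference: pass the policy's action through unchanged
        # when it looks safe; otherwise pick the lowest-risk alternative.
        if self.risk(state, proposed) < self.threshold:
            return proposed
        return min(alternatives, key=lambda a: self.risk(state, a))

shield = LearnedShield()
for _ in range(10):
    shield.update("s0", "left", violated=True)
    shield.update("s0", "right", violated=False)
assert shield.filter("s0", "left", ["left", "right"]) == "right"
```

The "intervene only when necessary" structure mirrors one property the abstract attributes to shields: the learner keeps optimizing reward because safe actions are never altered.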
-
Learning safe solutions is an important but challenging problem in multi-agent reinforcement learning (MARL). Shielded reinforcement learning is one approach for preventing agents from choosing unsafe actions. Current shielded reinforcement learning methods for MARL make strong assumptions about communication and full observability. In this work, we extend the formalization of the shielded reinforcement learning problem to a decentralized multi-agent setting. We then present an algorithm for decomposition of a centralized shield, allowing shields to be used in such decentralized, communication-free environments. Our results show that agents equipped with decentralized shields perform comparably to agents with centralized shields in several tasks, allowing shielding to be used in environments with decentralized training and execution for the first time.
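One way to picture the decomposition described in this abstract: treat the centralized shield as a set of safe joint actions, and shrink each agent's local action set until every remaining combination is jointly safe, so agents can act without communicating. The greedy construction and the names `decompose` and `safe_joint` below are illustrative assumptions, not the paper's algorithm.

```python
# Sketch of shield decomposition: find per-agent allowed action sets whose
# Cartesian product lies entirely inside the centralized safe set.
from itertools import product

def decompose(safe_joint, per_agent_actions):
    """Greedily remove actions whose combinations with the other agents'
    remaining actions can be unsafe, until all combinations are safe."""
    allowed = [set(acts) for acts in per_agent_actions]
    changed = True
    while changed:
        changed = False
        for i in range(len(allowed)):
            for a in sorted(allowed[i]):
                others = [sorted(s) for j, s in enumerate(allowed) if j != i]

                def joint(combo):
                    c = list(combo)
                    c.insert(i, a)  # splice agent i's action back in
                    return tuple(c)

                if len(allowed[i]) > 1 and any(
                    joint(c) not in safe_joint for c in product(*others)
                ):
                    allowed[i].discard(a)
                    changed = True
    return allowed

# Two agents, actions {0, 1}; the joint action (1, 1) is unsafe.
safe = {(0, 0), (0, 1), (1, 0)}
allowed = decompose(safe, [[0, 1], [0, 1]])
```

The trade-off the abstract's experiments measure shows up even here: the decentralized product set is a conservative subset of the centralized safe set, so some safe joint actions are sacrificed in exchange for communication-free execution.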